Introduction

For this case study, our primary focus is to determine which compound is the best biomarker of recent marijuana use and in which matrix it is best measured. Each compound (THC, CBN, etc.) is measured in one of three matrices: blood, breath, or oral fluid. Defining recent use as use within the past three hours (180 minutes), we want to find the compound and matrix that most reliably indicate recent use, so that our findings can be extrapolated to real-life applications.

This study matters because accurately testing for marijuana use is important in many everyday scenarios. For example, THC in marijuana can affect an individual’s motor skills, depth perception, and overall cognition, which can hinder their ability to work effectively and safely. According to a study reported by the National Institute on Drug Abuse, employees who tested positive for marijuana had 55% more workplace accidents and 85% more work-related injuries [1]. These liabilities can hurt the company and, more importantly, the individual, which is why it is paramount for companies to run effective drug tests on their employees.

Knowing which compound and matrix to test is equally important when we want to find out whether a driver is under the influence. Quoted directly from the National Highway Traffic Safety Administration: “In the 2013-2014 survey [2], 12.6 percent of weekend nighttime drivers tested positive for marijuana. That’s a 48-percent increase in less than 10 years” [3].

These figures are alarming, which is why we want to determine which compound and matrix are most effective for deciding whether an individual is under the influence. We also go a step further and ask whether some of these compounds are more sensitive than others to higher or lower doses of marijuana. Understanding this relationship can tell us which compounds deserve the most attention, which ultimately saves a tester time when running a drug test.

Load packages

library(tidymodels)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(janitor)
library(purrr)
library(rstatix)
library(cowplot)

Question

  • Which compound, in which matrix, and at what cutoff is the best biomarker of recent use? (Recent use is defined as use within 3 hours.)
  • Do some compounds respond more to a high dose or a low dose than others?

The Data

Data Import

WB = read.csv("data/Blood.csv")
BR = read.csv("data/Breath.csv")
OF = read.csv("data/OF.csv")

Data Wrangling

We recoded and releveled Treatment (and, for oral fluid and breath, Group), cleaned the column names with the janitor package, and renamed specific columns: x11_oh_thc to thcoh and thc_v to thcv in the oral fluid and blood tables; fluid_type to fluid, thc_cooh to thccooh, and thc_cooh_gluc to thccooh_gluc in the blood table; and thc_pg_pad to thc in the breath table.

Suggestion: add a new column recording whether each sample should be counted as recent THC use (THC group, within 3 hours of smoking) or not (placebo group, or THC group outside the 3-hour window).

OF <- OF |>
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group, 
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |>  
  janitor::clean_names() |>
  rename(thcoh = x11_oh_thc,
         thcv = thc_v)

WB <- WB |> 
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)")) |> 
  janitor::clean_names() |>
  rename(fluid = fluid_type,
         thcoh = x11_oh_thc,
         thccooh = thc_cooh,
         thccooh_gluc = thc_cooh_gluc,
         thcv = thc_v)

BR <- BR |> 
  mutate(Treatment = fct_recode(Treatment, 
                                "5.9% THC (low dose)" = "5.90%",
                                "13.4% THC (high dose)" = "13.40%"),
         Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
         Group = fct_recode(Group, 
                            "Occasional user" = "Not experienced user",
                            "Frequent user" = "Experienced user" )) |> 
  janitor::clean_names() |> 
  rename(thc = thc_pg_pad)


compounds_WB <-  as.list(colnames(Filter(function(x) !all(is.na(x)), WB[6:13])))
compounds_BR <-  as.list(colnames(Filter(function(x) !all(is.na(x)), BR[6])))
compounds_OF <-  as.list(colnames(Filter(function(x) !all(is.na(x)), OF[6:12])))
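One way to act on the suggestion of recording recent use as its own column is sketched below. This is a minimal illustration on a made-up table, not part of the original analysis: it assumes the cleaned column names `treatment` and `time_from_start`, a placebo level named "Placebo", and a hypothetical new column `recent_use`.

```r
library(dplyr)

# Toy rows standing in for one of the cleaned tables (values are made up).
toy <- tibble(
  treatment = c("Placebo", "5.9% THC (low dose)",
                "13.4% THC (high dose)", "5.9% THC (low dose)"),
  time_from_start = c(45, 45, 200, -10)
)

# recent_use is TRUE only for THC-group samples taken in the first
# 180 minutes after smoking started; everything else is FALSE.
toy <- toy |>
  mutate(recent_use = treatment != "Placebo" &
           time_from_start > 0 &
           time_from_start <= 180)

toy
```

Only the second row is flagged here: the placebo row, the THC row at 200 minutes, and the pre-smoking row are all treated as not recent.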

We created three reference tables, one each for blood, breath, and oral fluid, that define labeled time windows in minutes covering the pre-smoking period and the successive post-smoking periods.

timepoints_WB <- tibble(
  start = c(-400, 0, 30, 70, 100, 180, 210, 240, 270, 300),
  stop = c(
    0,
    30,
    70,
    100,
    180,
    210,
    240,
    270,
    300,
    max(WB$time_from_start, na.rm = TRUE)
  ),
  timepoint = c(
    "pre-smoking",
    "0-30 min",
    "31-70 min",
    "71-100 min",
    "101-180 min",
    "181-210 min",
    "211-240 min",
    "241-270 min",
    "271-300 min",
    "301+ min"
  )
)

timepoints_BR <- tibble(
  start = c(-400, 0, 40, 90, 180, 210, 240, 270),
  stop = c(
    0,
    40,
    90,
    180,
    210,
    240,
    270,
    max(BR$time_from_start, na.rm = TRUE)
  ),
  timepoint = c(
    "pre-smoking",
    "0-40 min",
    "41-90 min",
    "91-180 min",
    "181-210 min",
    "211-240 min",
    "241-270 min",
    "271+ min"
  )
)

timepoints_OF <- tibble(
  start = c(-400, 0, 30, 90, 180, 210, 240, 270),
  stop = c(0, 30, 90, 180, 210, 240, 270,
           max(OF$time_from_start, na.rm = TRUE)),
  timepoint = c(
    "pre-smoking",
    "0-30 min",
    "31-90 min",
    "91-180 min",
    "181-210 min",
    "211-240 min",
    "241-270 min",
    "271+ min"
  )
)

assign_timepoint <- function(x, timepoints) {
  if (!is.na(x)) {
    timepoints$timepoint[x > timepoints$start & x <= timepoints$stop]
  } else{
    NA
  }
}
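The lookup in assign_timepoint works one value at a time and returns an empty result for a time that falls outside every window. As a hedged alternative, the same half-open windows (start, stop] can be expressed with base R's `cut()`. The sketch below assumes the windows are contiguous (each stop equals the next start), as they are in the tables above; `assign_timepoint_vec` and `timepoints_toy` are hypothetical names.

```r
# Toy reference table with the same layout as timepoints_WB / _BR / _OF.
timepoints_toy <- data.frame(
  start = c(-400, 0, 30),
  stop = c(0, 30, 90),
  timepoint = c("pre-smoking", "0-30 min", "31-90 min")
)

# Vectorized version: cut() bins x into (start, stop] intervals and
# returns NA for missing values or values outside every window.
assign_timepoint_vec <- function(x, timepoints) {
  breaks <- c(timepoints$start[1], timepoints$stop)
  cut(x, breaks = breaks, labels = timepoints$timepoint, right = TRUE)
}

res <- assign_timepoint_vec(c(-5, 0, 10, 30, 45, NA, 500), timepoints_toy)
as.character(res)
```

Because `cut()` returns a factor whose levels follow the order of the labels, the separate releveling step would not be needed with this approach.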

We created a new column, timepoint_use, in each table by mapping the time_from_start values to the time windows defined in the reference data frames (timepoints_WB, timepoints_OF, timepoints_BR). Finally, we releveled the timepoint_use factor so that its levels follow the order given in the reference data frames. This keeps the timepoint labels consistent and correctly ordered in later analyses and visualizations.

 WB <- WB |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_WB),
         timepoint_use = fct_relevel(timepoint_use, timepoints_WB$timepoint))

OF <- OF |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_OF),
         timepoint_use = fct_relevel(timepoint_use, timepoints_OF$timepoint))

BR <- BR |> 
  mutate(timepoint_use = map_chr(time_from_start, 
                                 assign_timepoint, 
                                 timepoints=timepoints_BR),
         timepoint_use = fct_relevel(timepoint_use, timepoints_BR$timepoint))

Remove duplicate IDs

WB <- drop_dups(WB)
OF <- drop_dups(OF)
BR <- drop_dups(BR)
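The definition of drop_dups is not shown in this report. As a rough illustration only, a helper with this role might keep one row per participant within each time window and drop rows with no assigned window. The sketch below is an assumption about that behavior, uses the hypothetical name `drop_dups_sketch`, and assumes an `id` column.

```r
library(dplyr)

# Hypothetical stand-in for drop_dups(): within each timepoint window,
# keep the first row for each participant id; drop rows with no window.
drop_dups_sketch <- function(dataset) {
  dataset |>
    filter(!is.na(timepoint_use)) |>
    group_by(timepoint_use) |>
    distinct(id, .keep_all = TRUE) |>
    ungroup()
}

# Toy data: participant 1 was sampled twice in the same window.
toy <- tibble(
  id = c(1, 1, 2, 2, 3),
  timepoint_use = c("0-30 min", "0-30 min", "0-30 min", "31-90 min", NA),
  thc = c(5.1, 4.8, 0.2, 0.1, 0.0)
)

deduped <- drop_dups_sketch(toy)
deduped
```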

Exploratory Data Analysis

Compound measurements over time by treatment

The following plots show every compound against time, colored by group. We generated scatterplots for the compounds in each of the three matrices (whole blood, oral fluid, and breath), covering all timepoints and all three treatments (placebo, low dose, and high dose).

One observation stands out in these scatterplots: THC in whole blood shows a visible separation between the placebo and THC treatment groups. This suggests that the THC measurement in whole blood may be a more reliable indicator of recent cannabis joint use than the other compounds.

scatter_WB <- map(compounds_WB, ~ compound_scatterplot_group_by_treatment( 
    dataset=WB, 
    compound=.x, 
    timepoints=timepoints_WB))

scatter_OF <- map(compounds_OF, ~ compound_scatterplot_group_by_treatment( 
    dataset=OF, 
    compound=.x, 
    timepoints=timepoints_OF))

scatter_BR <- map(compounds_BR, ~ compound_scatterplot_group_by_treatment( 
    dataset=BR, 
    compound=.x, 
    timepoints=timepoints_BR))

The next set of scatterplots shows the same compounds against time, colored by treatment, but with a log transformation applied to the y-axis (the compound measurement).

On the log scale, THC in breath shows a clearer separation between the placebo and THC treatment groups. By compressing the scale, the log transformation reveals differences at low concentrations that are hard to see on a linear scale. The choice of transformation therefore affects what a plot can show, which matters when comparing compounds across treatments over time.

scatter_WB_by_treatment <- map(compounds_WB, ~ compound_scatterplot_group_by_treatment_log( 
    dataset=WB, 
    compound=.x, 
    timepoints=timepoints_WB))

scatter_OF_by_treatment <- map(compounds_OF, ~ compound_scatterplot_group_by_treatment_log( 
    dataset=OF, 
    compound=.x, 
    timepoints=timepoints_OF))

scatter_BR_by_treatment <- map(compounds_BR, ~ compound_scatterplot_group_by_treatment_log( 
    dataset=BR, 
    compound=.x, 
    timepoints=timepoints_BR))
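The two plotting helpers used above are not defined in this report. The sketch below shows one way such a plot could be built, as an illustration rather than the actual helper: `compound_scatter_sketch` is a hypothetical name, the column names `time_from_start` and `treatment` are assumed from the cleaned tables, and `log1p()` is used for the log version so that zero measurements stay on the plot (the real helper may use a different transformation).

```r
library(ggplot2)

# Hypothetical plotting helper: one compound against time, colored by treatment.
# With log_y = TRUE the measurement is transformed with log1p() before plotting.
compound_scatter_sketch <- function(dataset, compound, log_y = FALSE) {
  dataset$y_value <- if (log_y) log1p(dataset[[compound]]) else dataset[[compound]]
  y_label <- if (log_y) paste0("log(1 + ", compound, ")") else compound
  ggplot(dataset, aes(x = time_from_start, y = y_value, color = treatment)) +
    geom_point(alpha = 0.5) +
    labs(x = "Time from start (min)", y = y_label)
}

# Toy data (values are made up).
toy <- data.frame(
  time_from_start = c(-20, 10, 45, 120, 10, 45),
  treatment = c("Placebo", "Placebo", "Placebo",
                "5.9% THC (low dose)", "5.9% THC (low dose)", "5.9% THC (low dose)"),
  thc = c(0, 0, 0.1, 2.5, 40, 12)
)

p_linear <- compound_scatter_sketch(toy, "thc")
p_log <- compound_scatter_sketch(toy, "thc", log_y = TRUE)
```

Printing `p_linear` and `p_log` side by side shows why the log version helps: the low placebo values and the high THC values are both visible on the same axis.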

We drop the compounds that clearly do not separate the groups from the compound lists. WB: cbd, thccooh, thccooh_gluc, thcv. OF: thcoh.

compounds_WB = compounds_WB[- c(2, 5, 6, 8)]
compounds_OF = compounds_OF[- c(4)]

Analysis

Calculating sensitivity and specificity.

output_WB <- map_dfr(compounds_WB,
                     ~ sens_spec_cpd(
                       dataset = WB,
                       cpd = all_of(.x),
                       timepoints =  timepoints_WB
                     )) |> clean_gluc()

output_BR <- map_dfr(compounds_BR, 
                     ~ sens_spec_cpd(
                       dataset = BR,
                       cpd = all_of(.x),
                       timepoints = timepoints_BR
                     ))  |> clean_gluc()

output_OF <- map_dfr(compounds_OF,
                     ~ sens_spec_cpd(
                       dataset = OF,
                       cpd = all_of(.x),
                       timepoints = timepoints_OF
                     ))  |> clean_gluc()
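The helper sens_spec_cpd is defined outside this report. To make the calculation concrete, here is a minimal sketch of sensitivity and specificity for one compound at a given cutoff. It assumes that a sample is called positive when its value is at or above the cutoff and that the true label is a logical `recent_use` vector; both are assumptions, and `sens_spec_at_cutoff` is a hypothetical name.

```r
# Confusion-matrix counts and sensitivity/specificity at one cutoff.
# value: measured concentration; recent_use: TRUE if the sample is truly recent use.
sens_spec_at_cutoff <- function(value, recent_use, cutoff) {
  predicted <- value >= cutoff            # assumed decision rule
  TP <- sum(predicted & recent_use, na.rm = TRUE)
  FN <- sum(!predicted & recent_use, na.rm = TRUE)
  FP <- sum(predicted & !recent_use, na.rm = TRUE)
  TN <- sum(!predicted & !recent_use, na.rm = TRUE)
  data.frame(cutoff, TP, FN, FP, TN,
             sensitivity = TP / (TP + FN),
             specificity = TN / (TN + FP))
}

# Toy measurements (made up): four recent-use samples, four non-recent samples.
value <- c(0, 0.3, 1.2, 4.0, 7.5, 0.6, 0, 2.1)
recent_use <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

results <- do.call(
  rbind,
  lapply(c(0.5, 1, 5), sens_spec_at_cutoff, value = value, recent_use = recent_use)
)
results
```

In this toy example, raising the cutoff from 0.5 to 5 lowers sensitivity from 1 to 0.25 and raises specificity from 0.75 to 1, which is the trade-off the plots in the next section explore.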

Cutoff vs. sensitivity/specificity

Here we plot the cutoff value against sensitivity and specificity for every compound in every matrix and arrange the plots into one figure. Unlike an ROC curve, which plots sensitivity against 1 − specificity, these plots show each metric directly as a function of the cutoff, which makes it easier to explore candidate cutoff points. Overall, the specificity of every compound increases as the detection limit rises, while sensitivity falls toward zero.

# Arrange the sensitivity/specificity plots (ss_WB, ss_OF, ss_BR) into one figure
ss_bottom_row <-
  plot_grid(
    ss_OF,
    ss_BR,
    labels = c('B', 'C'),
    label_size = 12,
    ncol = 2,
    rel_widths = c(0.66, .33)
  )
plot_grid(
  ss_WB,
  ss_bottom_row,
  labels = c('A', ''),
  label_size = 12,
  ncol = 1
)

Average sensitivity and specificity vs. detection limit

output_WB_avg = average_sens_spec(output = output_WB)
output_OF_avg = average_sens_spec(output = output_OF)
output_BR_avg = average_sens_spec(output = output_BR)


ss_WB_avg_together <-
  ss_plot_avg_together(output_WB_avg, tpts = length(unique(output_WB$time_start)), tissue = "Blood")

ss_OF_avg_together <-
  ss_plot_avg_together(output_OF_avg, tpts = length(unique(output_OF$time_start)), tissue = "Oral Fluid")

ss_BR_avg_together <-
  ss_plot_avg_together(output_BR_avg, tpts = length(unique(output_BR$time_start)), tissue = "Breath")

I will now remove every compound whose average sensitivity and average specificity curves do not intersect. Reasoning: for compounds with no intersection, sensitivity is at its best at the leftmost point of the plot, which is also where specificity is at its worst, and no cutoff on the grid brings the two into balance, so there is no room for adjustment.

compounds_WB = c("thc")
compounds_OF = c("thc")
compounds_BR = NULL

Sensitivity vs. specificity

Here we plot sensitivity against specificity for each remaining compound in each matrix and combine the results into one figure, which makes it easy to compare how the biomarkers trade off sensitivity and specificity.

output_WB <- map_dfr(compounds_WB,
                     ~ sens_spec_cpd(
                       dataset = WB,
                       cpd = all_of(.x),
                       timepoints =  timepoints_WB
                     )) |> clean_gluc()


output_OF <- map_dfr(compounds_OF,
                     ~ sens_spec_cpd(
                       dataset = OF,
                       cpd = all_of(.x),
                       timepoints = timepoints_OF
                     ))  |> clean_gluc()

#plot sensitivity vs. specificity
roc_WB = roc_plot(output_WB, tpts = length(unique(output_WB$time_start)), tissue = "Blood")

roc_OF = roc_plot(output_OF, tpts = length(unique(output_OF$time_start)), tissue = "Oral Fluid")

# #arrange roc plots
# roc_bottom_row <-
#   plot_grid(
#     roc_OF,
#     roc_BR,
#     labels = c('B', 'C'),
#     label_size = 12,
#     ncol = 2,
#     rel_widths = c(0.66, .33)
#   )
# plot_grid(
#   roc_WB,
#   roc_bottom_row,
#   labels = c('A', ''),
#   label_size = 12,
#   ncol = 1
# )

These plots suggest that OF-THC is the stronger choice. Next we look more closely at OF-THC to find a specific cutoff. Referring back to the average sensitivity and specificity vs. detection limit plot, both metrics are high only when the detection limit is very close to 0, so we try some cutoffs close to 0.


Plot sensitivity and specificity over time given specific cutoffs

Here we take a closer look at sensitivity and specificity over time for THC measured in blood and oral fluid. Comparing the two directly, oral fluid shows higher sensitivity and specificity than blood, particularly within the critical window of three hours post-smoking.

#pass specific cutoff into splits parameter
OF_THC <- sens_spec_cpd(
  dataset = OF,
  cpd = 'thc',
  timepoints = timepoints_OF,
  splits =  c(0.5, 1, 2, 5, 10)
) |> clean_gluc()

of_levels <- c("pre-smoking\nN=192", "0-30\nmin\nN=192", "31-90\nmin\nN=117",
               "91-180\nmin\nN=99", "181-210\nmin\nN=102", "211-240\nmin\nN=83",
               "241-270\nmin\nN=90",  "271+\nmin\nN=76")

plot_cutoffs(dataset=OF_THC, 
             timepoint_use_variable=OF$timepoint_use, 
             tissue="Oral Fluid", 
             cpd="THC", 
             x_labels=NULL)
## [[1]]

## 
## [[2]]
## # A tibble: 40 × 18
##       TP    FN    FP    TN detection_limit compound time_start time_stop
##    <dbl> <dbl> <int> <int> <fct>           <chr>         <dbl>     <dbl>
##  1     0     0    35   157 0.5             THC            -400         0
##  2     0     0    20   172 1               THC            -400         0
##  3     0     0     9   183 2               THC            -400         0
##  4     0     0     0   192 5               THC            -400         0
##  5     0     0     0   192 10              THC            -400         0
##  6   129     0    39    24 0.5             THC               0        30
##  7   129     0    30    33 1               THC               0        30
##  8   128     1    19    44 2               THC               0        30
##  9   128     1     3    60 5               THC               0        30
## 10   125     4     1    62 10              THC               0        30
## # ℹ 30 more rows
## # ℹ 10 more variables: time_window <fct>, NAs <int>, N <int>, N_removed <int>,
## #   Sensitivity <dbl>, Specificity <dbl>, PPV <dbl>, NPV <dbl>,
## #   Efficiency <dbl>, my_label <fct>

Average sensitivity is a lot more sensitive (ha) to the cutoff than average specificity: specificity only dips in the 31-90 min window when the cutoff is lowered, whereas a lower cutoff increases sensitivity across the board, regardless of time.

In a nutshell, a low cutoff is optimal, approximately somewhere between 0 and 2. Let’s test more cutoffs in this range:

OF_THC <- sens_spec_cpd(
  dataset = OF,
  cpd = 'thc',
  timepoints = timepoints_OF,
  splits =  c(0.1, 0.25, 0.5, 1, 1.5)
) |> clean_gluc()

blood_levels <- c("pre-smoking\nN=189", "0-30\nmin\nN=187", "31-70\nmin\nN=165",
                  "71-100\nmin\nN=157", "101-180\nmin\nN=168", "181-210\nmin\nN=103",
                  "211-240\nmin\nN=127", "241-270\nmin\nN=137", "271-300\nmin\nN=120",
                  "301+\nmin\nN=88")

of_levels <- c("pre-smoking\nN=192", "0-30\nmin\nN=192", "31-90\nmin\nN=117",
               "91-180\nmin\nN=99", "181-210\nmin\nN=102", "211-240\nmin\nN=83",
               "241-270\nmin\nN=90",  "271+\nmin\nN=76")

plot_cutoffs(dataset=OF_THC, 
             timepoint_use_variable=OF$timepoint_use, 
             tissue="Oral Fluid", 
             cpd="THC", 
             x_labels=NULL)
## [[1]]

They all look pretty promising, so we need a way to quantify this. I will calculate the sensitivity and specificity for cutoff values between 0 and 2.

output_OF = sens_spec_cpd_OFTHC(
                       dataset = OF,
                       cpd = "thc",
                       timepoints = timepoints_OF
                     )  |> clean_gluc()

output_OF_avg = average_sens_spec(output = output_OF)

output_OF_avg
## # A tibble: 101 × 4
##    compound detection_limit average_sensitivity average_specificity
##    <chr>              <dbl>               <dbl>               <dbl>
##  1 THC                 0                  0.956               0    
##  2 THC                 0.02               0.956               0.817
##  3 THC                 0.04               0.956               0.817
##  4 THC                 0.06               0.956               0.817
##  5 THC                 0.08               0.956               0.817
##  6 THC                 0.1                0.956               0.817
##  7 THC                 0.12               0.956               0.817
##  8 THC                 0.14               0.956               0.817
##  9 THC                 0.16               0.956               0.817
## 10 THC                 0.18               0.956               0.817
## # ℹ 91 more rows

Let’s plot this quickly:

ss_OF_avg_together <-
  ss_plot_avg_together(output_OF_avg, tpts = length(unique(output_OF$time_start)), tissue = "Oral Fluid")

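We found it. The point where the two curves intersect is where average sensitivity and average specificity are balanced, which is the criterion we use to choose the cutoff. (This point need not coincide with the maximum of sensitivity + specificity.) Let’s get the specific value: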

output_OF_avg |>
  filter(abs(average_sensitivity-average_specificity) < 0.01) |>
  mutate(diff = abs(average_sensitivity-average_specificity))
## # A tibble: 5 × 5
##   compound detection_limit average_sensitivity average_specificity    diff
##   <chr>              <dbl>               <dbl>               <dbl>   <dbl>
## 1 THC                 0.82               0.893               0.890 0.00338
## 2 THC                 0.84               0.893               0.890 0.00338
## 3 THC                 0.86               0.893               0.890 0.00338
## 4 THC                 0.88               0.893               0.890 0.00338
## 5 THC                 0.9                0.893               0.890 0.00338

At cutoffs from 0.82 to 0.90, the difference between the average sensitivity and average specificity is minimized (0.893 vs. 0.890). Since every cutoff in this range gives the same values, we pick 0.85, near the middle of the range.
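The balance criterion used here (smallest gap between sensitivity and specificity) is one of several common ways to choose a cutoff. Another is Youden's J, which maximizes sensitivity + specificity − 1. The sketch below compares the two on a small made-up table with the same column names as output_OF_avg. The numbers are invented for illustration and are not results from this study.

```r
# Toy averaged results (made-up numbers, same columns as output_OF_avg).
avg <- data.frame(
  detection_limit = c(0, 0.5, 1, 2, 5),
  average_sensitivity = c(0.98, 0.97, 0.90, 0.80, 0.60),
  average_specificity = c(0.00, 0.85, 0.88, 0.95, 0.99)
)

# Criterion 1: the cutoff where sensitivity and specificity are closest.
avg$gap <- abs(avg$average_sensitivity - avg$average_specificity)
balanced_cutoff <- avg$detection_limit[which.min(avg$gap)]

# Criterion 2: Youden's J, the cutoff maximizing sensitivity + specificity - 1.
avg$youden_j <- avg$average_sensitivity + avg$average_specificity - 1
youden_cutoff <- avg$detection_limit[which.max(avg$youden_j)]

c(balanced = balanced_cutoff, youden = youden_cutoff)
```

In this toy table the two criteria pick different cutoffs (1 for the balance point, 0.5 for Youden's J), so it is worth stating which criterion a reported cutoff is based on.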


The optimal biomarker under this criterion is THC in oral fluid (OF-THC) at a cutoff of 0.85.

Extended Question

Wrangling

WB_long = WB |>
  pivot_longer(6:13, names_to = "compound")

OF_long = OF |>
  pivot_longer(6:12, names_to = "compound")

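# Name the key column "compound" so it lines up with WB_long and OF_long in bind_rows()
BR_long <- BR |> pivot_longer(6, names_to = "compound")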

df_full <- bind_rows(WB_long, OF_long, BR_long)

Part 2: a different extended question

This one is very interesting. Some of the pair plots we went over in lecture notes 12 looked like they had two separate lines, so I suspected that one of the variables changes how compounds correlate with each other, and it does. Namely, the low dose and high dose groups seem to have different slopes for the relationship between certain compounds.

Here are a few of the more obvious ones:

ggplot(data = WB,
       aes(y = cbg, x = thc, color = treatment)) +
  geom_point(alpha = 0.5) +
  geom_smooth(
    method = "lm",
    # Use linear regression
    formula = y ~ x,
    # Specify the formula for the linear model
    se = FALSE,
    # Don't show the confidence interval
    data = WB |> filter(treatment == "5.9% THC (low dose)"),
    # Filter for low dose
    color = "green"
  ) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = WB |> filter(treatment == "13.4% THC (high dose)"),
    # Filter for high dose
    color = "blue"
  ) +
  facet_wrap(~ group, scales = "free")

ggplot(data = OF,
       aes(y = thc, x = cbg, color = treatment)) +
  geom_point(alpha = 0.5) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = OF |> filter(treatment == "5.9% THC (low dose)"),
    color = "green"
  ) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = OF |> filter(treatment == "13.4% THC (high dose)"),
    color = "blue"
  ) +
  facet_wrap( ~ group, scales = "free")

ggplot(data = OF,
       aes(y = cbn, x = cbg, color = treatment)) +
  geom_point(alpha = 0.5) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = OF |> filter(treatment == "5.9% THC (low dose)"),
    color = "green"
  ) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = OF |> filter(treatment == "13.4% THC (high dose)"),
    color = "blue"
  ) +
  facet_wrap( ~ group, scales = "free")

ggplot(data = OF,
       aes(y = cbg, x = thcv, color = treatment)) +
  geom_point(alpha = 0.5) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = OF |> filter(treatment == "5.9% THC (low dose)"),
    color = "green"
  ) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    se = FALSE,
    data = OF |> filter(treatment == "13.4% THC (high dose)"),
    color = "blue"
  ) +
  facet_wrap( ~ group, scales = "free")

These plots suggest that even though low versus high dose does not affect compound concentrations much overall (as shown in the plots in the lecture notes), it does change how compounds relate to one another. This motivates the extended question: do some compounds respond more to a high dose or a low dose than others?

To quantify what the plots show, we fit two linear models for each compound pair, one on the low dose group and one on the high dose group, and compare the fitted slopes. The size of the change in slope between dosage conditions indicates how strongly the relationship between the two compounds depends on dose.

WB_low <- WB |>
  filter(treatment == "5.9% THC (low dose)")
WB_high <- WB |>
  filter(treatment == "13.4% THC (high dose)")

#linear regression `cbg ~ thc` for 5.9% THC (low dose) group
linear_reg() |>
  set_engine("lm") |>
  fit(cbg ~ thc, data = WB_low) |>
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  -0.139   0.0142       -9.77 1.09e- 20
## 2 thc           0.0455  0.000534     85.2  1.62e-291
#linear regression `cbg ~ thc` for 13.4% THC (high dose) group
linear_reg() |>
  set_engine("lm") |>
  fit(cbg ~ thc, data = WB_high) |>
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  -0.0562  0.00734      -7.66 1.08e- 13
## 2 thc           0.0160  0.000448     35.7  4.01e-136
OF_low <- OF |>
  filter(treatment == "5.9% THC (low dose)")
OF_high <- OF |>
  filter(treatment == "13.4% THC (high dose)")

#linear regression `cbn ~ cbg` for 5.9% THC (low dose) group
linear_reg() |>
  set_engine("lm") |>
  fit(cbn ~ cbg, data = OF_low) |>
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)     6.35    1.98        3.22 1.43e-  3
## 2 cbg             1.59    0.0211     75.1  1.02e-208
#linear regression `cbn ~ cbg` for 13.4% THC (high dose) group
linear_reg() |>
  set_engine("lm") |>
  fit(cbn ~ cbg, data = OF_high) |>
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)     6.20    3.94        1.57 1.17e-  1
## 2 cbg             4.47    0.0791     56.5  3.01e-163
#linear regression `cbg ~ thcv` for 5.9% THC (low dose) group
linear_reg() |>
  set_engine("lm") |>
  fit(cbg ~ thcv, data = OF_low) |>
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)    -3.83     1.64      -2.34 2.01e-  2
## 2 thcv            8.33     0.151     55.0  1.47e-167
#linear regression `cbg ~ thcv` for 13.4% THC (high dose) group
linear_reg() |>
  set_engine("lm") |>
  fit(cbg ~ thcv, data = OF_high) |>
  tidy()
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)    -1.16    0.352      -3.31 1.05e-  3
## 2 thcv            3.68    0.0261    141.   3.29e-279
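The separate fits above give one slope per dose group but no direct test of whether the slopes differ. One standard way to test that is a single model with an interaction term. The sketch below uses base `lm()` on simulated data rather than the study tables. The simulated slopes (0.045 and 0.016) are chosen only to resemble the blood fits above, and the short level names "low" and "high" are placeholders.

```r
set.seed(1)
n <- 100

# Simulated data: cbg depends on thc with a different slope in each dose group.
toy <- data.frame(
  thc = runif(2 * n, 0, 50),
  treatment = factor(rep(c("low", "high"), each = n), levels = c("low", "high"))
)
true_slope <- ifelse(toy$treatment == "low", 0.045, 0.016)
toy$cbg <- -0.1 + true_slope * toy$thc + rnorm(2 * n, sd = 0.05)

# thc * treatment expands to thc + treatment + thc:treatment.
# The thc:treatmenthigh coefficient is the difference in slope (high minus low).
fit <- lm(cbg ~ thc * treatment, data = toy)
coefs <- coef(summary(fit))

slope_low <- coefs["thc", "Estimate"]
slope_diff <- coefs["thc:treatmenthigh", "Estimate"]
p_value_diff <- coefs["thc:treatmenthigh", "Pr(>|t|)"]
```

Here `slope_low` estimates the low dose slope and `slope_diff` estimates how much the high dose slope differs from it, with `p_value_diff` testing whether that difference is zero. Applied to the real tables, the same formula would be `cbg ~ thc * treatment` on the rows from the two THC groups.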

Results/Discussion

During our exploratory data analysis, a pattern emerged: THC is the compound that most effectively indicates recent marijuana use. Our analysis therefore focuses on the sensitivity and specificity of THC cutoffs in blood and oral fluid. Oral fluid shows higher sensitivity across all timepoints, and the Receiver Operating Characteristic (ROC) curves comparing THC in blood and oral fluid show the same trend: oral fluid is more accurate than blood. In other words, oral fluid is better at detecting recent marijuana joint use, which supports THC in oral fluid as the best biomarker of recent use.

Next, we determined the cutoff. We calculated the average sensitivity and specificity across all time windows for a range of detection limits and looked for the point at which the two metrics intersect, which we treat as the indicator of the optimal cutoff. The plot and the associated table show that the two converge at detection limits from 0.82 to 0.90. THC in oral fluid at a detection limit of 0.82 to 0.90 therefore stands out as the most effective biomarker of recent use.

Our exploration also extends to a broader question: how do different compounds respond to marijuana dosage? We focus on three comparisons: CBG with THC in blood, CBN with CBG in oral fluid, and CBG with THCV in oral fluid. For CBG with THC in blood, the THC coefficient in the low dose group (0.0455) is nearly three times that in the high dose group (0.0160), meaning each unit increase in THC is associated with a larger increase in CBG in the low dose group. For CBN with CBG in oral fluid, the pattern reverses: the CBG coefficient is higher in the high dose group (4.47 vs. 1.59), so CBN rises more steeply with CBG at the higher dose. For CBG with THCV in oral fluid, the THCV coefficient is again larger in the low dose group (8.33 vs. 3.68). Because the two slopes in each pair come from separately fitted models, these comparisons are descriptive rather than formal tests of a difference.

Conclusion

The results of our case study on biomarkers of recent marijuana use reveal intriguing insights into the choice of compound, matrix, and cutoff for effective detection. The overarching goal was to identify the most potent biomarker that, when combined with a specific matrix and cutoff, can reliably indicate recent marijuana use within a three-hour timeframe.

Primary Biomarker: THC in Oral Fluid

Our analysis points to THC in oral fluid as the best biomarker of recent marijuana use. The scatterplots of compound measurements over time show a separation between the placebo and THC treatment groups, and the ROC curve analysis supports this conclusion, with oral fluid THC showing higher sensitivity than blood THC across time points. This higher sensitivity makes oral fluid THC a compelling choice for detecting recent marijuana use.

Optimal Cutoff: 0.85

To choose a cutoff, we computed the average sensitivity and specificity over a grid of cutoff values and found that values between 0.82 and 0.90 best balance the two. We chose 0.85, which lies within the range where the difference between average sensitivity and average specificity is minimized. This gives a practical and balanced rule for identifying recent marijuana use.

Extended Question: Dose-Response Relationships

Beyond the primary question, we investigated how the relationships among compounds depend on dose. The fitted slopes between compounds differed under low and high dosage conditions. For instance, in blood, CBG increased more steeply with THC in the low dose group than in the high dose group, which suggests that some compound relationships are sensitive to dose. This prompts a follow-up question: “Do some compounds respond more to high dose/low dose compared to others?” Exploring it could reveal more about how different cannabinoids behave under varying dosage conditions.

In conclusion, while our study provides valuable insights into recent marijuana use detection, continuous research and validation are crucial for translating these findings into practical applications, especially in fields such as workplace drug testing and law enforcement.


  1. Marijuana at Work: What Employers Need to Know. NSC. https://www.nsc.org/nsc-membership/marijuana-at-work#:~:text=According%20to%20a%20study%20reported,Decreased%20productivity

  2. Research Note: Results of the 2013-2014 National Roadside Survey of Alcohol and Drug Use by Drivers. NHTSA. https://www.nhtsa.gov/sites/nhtsa.gov/files/812118-roadside_survey_2014.pdf

  3. Drug-Impaired Driving. NHTSA. https://www.nhtsa.gov/risky-driving/drug-impaired-driving#:~:text=In%202007%2C%20NHTSA’s%20National%20Roadside,in%20less%20than%2010%20years.